In the traditional speech recognition system, the optimal decoding paths are determined by a language model restrained by the training data. Almost inevitably, the right pronunciation may produce wrong character recognition results in some scenarios. In order to use the prosodic information in speech to enhance the probability of correct character combination in language model, an end-to-end speech recognition method based on prosodic features was proposed. Based on the attention mechanism based encoder-decoder speech recognition framework, firstly, the coefficient distribution of attention mechanism was used to extract prosodic features such as pronunciation interval and pronunciation energy. Then, the prosodic features were combined with decoder to significantly improve the accuracy of speech recognition in the cases with the same or similar pronunciation and semantic ambiguity. Experimental results show that the proposed method achieves a relative accuracy improvement of 5.2% and 5.0% respectively compared with the baseline end-to-end speech recognition method on 1 000 h and 10 000 h speech recognition tasks and improves the intelligibility of speech recognition results.